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Abstract 

We consider a two-component mixture model with one known component. We 
develop methods for estimating the mixing proportion and the other unknown 
distribution nonparametrically, given i.i.d. data from the mixture model. We 
use ideas from shape restricted function estimation and develop "tuning param- 
eter free" estimators that are easily implementable and have good finite sample 
performance. We establish the consistency of our procedures. Distribution-free 
finite sample lower confidence bounds are developed for the mixing proportion. 
The identifiability of the model, and the estimation of the density of the un- 
known mixing distribution are also addressed. We discuss the connection with 
the problem of multiple testing and compare our procedure with some of the 
existing methods in that area through simulation studies. We also analyse two 
data sets, one arising from an application in astronomy and the other from a 
microarray experiment. 

Keywords: Cramer-von Mises statistic, identifiability, local false discovery rate, 
lower bound, microarray experiment, shape restricted function estimation. 



1 Introduction 

We consider a mixture model with two components, i.e., 

F{x) = aFs{x) + (1 - a)Fb{x) (1.1) 

where the cumulative distribution function (CDF) is known, but the mixing pro- 
portion a G (0, 1) and the CDF Fg Ff,) are unknown. Given a random sample 
from F, we wish to (nonparametrically) estimate Fg and the parameter a. 
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This model appears in many contexts. In multiple testing problems (microar- 
ray analysis, neuroimaging) the p-values, obtained from the numerous (independent) 
hypotheses tests, are uniformly distributed on [0,1], under Hq, while their distribu- 
tion associated with Hi is unknown; see e.g., [EfrlOj and |RBHDP07j . Translated 
to the setting of (11.11) . is the uniform distribution and the goal is to estimate the 
proportion of false null hypotheses a and the distribution of the p-values under the 
alternative. In addition, a reliable estimate of a is important when we want to assess 
or control multiple error rates, such as the false discovery rate (FDR) of |BH95] . We 
discuss this problem in more detail in Section [31 

In contamination problems the distribution Ff,, for which reasonable assumptions 
can be made, maybe contaminated by an arbitrary distribution Fg, yielding a sample 
drawn from F as in (11.11) : see e.g., [MPOOj . For example, in astronomy, such situations 
arise quite often: when observing some variable(s) of interest (e.g., metallicity, radial 
velocity) of stars in a distant galaxy, foreground stars from the Milky Way, in the 
field of view, contaminate the sample; the galaxy ("signal") stars can be difficult 
to distinguish from the foreground stars as we can only observe the stereographic 
projections and not the three dimensional positions of the stars (see |WMO"'"09j ). 
Known physical models for the foreground stars help us constrain Fh, and the focus 
is on estimating the distribution of the variable for the signal stars, i.e., Fg. This 
problem also arises in High Energy Physics where often the signature of new physics 
is evidence of a significant-looking peak at some position on top of a rather smooth 
background distribution; see e.g., |Lyo08| . 

In this paper we provide a methodology to estimate a and Fg (nonparametrically) , 
without assuming any constraint on the form of Fg. We also develop a finite sample 
honest lower confidence bound for a that is distribution- free (i.e., it does not depend 
on the particular choice of F^ and Fg). We also propose a nonparametric estimator 
of fg, the density of Fg, when fg is assumed to be non-increasing. Our procedure is 
completely automated (i.e., tuning parameter free) and easily implementable. We also 
establish the consistency of the proposed estimators. To the best of our knowledge 
this is the first attempt to nonparametrically estimate the CDF Fg under no further 
assumptions. 

Most of the previous work on this problem assume some constraint on the form of 
the unknown distribution Fg, e.g., it is commonly assumed the distributions belong 
to certain parametric models, which lead to techniques based on maximum likelihood 
(see e.g., |Coh67] and |Lin83] ). minimum chi-square (see e.g., |Day69| ), method of mo- 
ments (see e.g., [LB93j ) and moment generating functions (see e.g., |QR78| ). |BMV06] 
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assume that both the components belong to an unknown symmetric location-shift 
family. In the multiple testing setup, this problem has been addressed by various 
authors and different estimators and confidence bounds for a have been proposed 
in the literature under suitable assumptions on Fg and its density, see e.g., |Sto02] . 
|GW04] . |MR06j . |MB05j . |CR10j and |LLF05j . For the sake of brevity, we do not 
discuss the above references here but come back to this application in Section [31 

The paper is organized as follows. In Section [2] we propose estimators of a, Fg and 
fs and investigate their theoretical properties. We also address the identifiability of 
the model and develop lower bounds for a. Connection to the multiple testing problem 
is developed in Section [31 In Section [H we compare the finite sample performance of 
our procedures with other estimators available in the literature through simulation 
studies. Two real data examples, one arising in astronomy and the other from a 
microarray experiment, are analysed in Section [51 We conclude with a brief discussion 
of our procedure and some open questions in Section [61 The Appendix gives the proofs 
of the numerous results stated in the paper. 



2 Estimation 

2.1 When a is known 

Suppose that we observe an i.i.d. sample Xi,X2,...,X„ from F as in ( 11. ip . If 
a e (0, 1) were known, a naive estimator of Fs would be 

where F„ is the empirical CDF of the observed sample, i.e., F„(x) = X]r=i — 
Although this estimator is consistent, it does not satisfy the basic requirements of a 
DF: Ff^ need not be non-decreasing or lie between and 1. This naive estimator can 
be improved by imposing the known shape constraint of monotonicity. This can be 
accomplished by minimizing 

/ {W{x) - F:^„ix)r dW^ix) ^ - - t,ni^^)y (2-2) 

^ i=l 

over all DFs W. Let F"^ be a DF that minimises (12. 2p . The above optimization 
problem is the same as minimizing \\0 ~ Vp over = {6i, . . . , On) G 0j„c where 

Qinc = G M" : < ^1 < ^2 < • • • < < 1}, (2.3) 
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V = {Vi, V2, . . . , Vn), Vi := F°^(X(j)), i = 1, 2, . . . , n, X(j) being the i-th order statistic 
of the sample, and || • || denotes the usual Euclidean norm in M". The estimator is 
uniquely defined by the projection theorem (see e.g., Proposition 2.2.1 in page 88 of 
|Ber03] ): it is the L2 projection of V on a closed convex cone in M". 6 is related to 
F"^ via F"„(X(j)) = 9i, and can be easily computed using the pool- adjacent- violators 
algorithm (PAVA); see Section 1.2 of |RWD88] . Thus, is uniquely defined at the 
data points Xj, for all i = 1, . . . ,n, and can be defined on the entire real line by 
extending it in a piece-wise constant fashion that is right continuous with possible 
jumps only at the data points. The following result, derived easily from Chapter 1 of 
|RWD88j . characterizes 

Lemma 2.1. Let he the isotonic regression (see e.g., page 4 of Chapter 1 of 
IRWDSSf ) of the set of points {F^n{-^ii))}i=i- Then F^^ is characterized as the right- 
hand slope of the greatest convex minorant of the set of points {i/n, Yl]=o -^s"n(^(j))}i"=o 
The restriction of F^^ to [0, 1], i.e., 

F°„ = min{max{F,%0},l}, 

minimises 1(2. ^)) over all DFs. 

Isotonic regression and the PAVA are very well studied in the statistical literature 
with many text-book length treatments; see e.g., |RWD88j and |BBBB72] . If skilfully 
implemented, PAVA has a computational complexity of 0{n) (see |GW84] ). 

2.2 Identifiability of F, 

When a is unknown, the problem is considerably harder; in fact, it is non-identifiable. 
If (II. ip holds for some F^ and a then the mixture model can be re-written as 

F= (« + 7) i^—Fs + ^^F^ +(l_a_^)F,, 
\q; + 7 a + 7 / 

for < 7 < 1 — a, and the term {aFg + 7Ff,) /(a -|- 7) can be thought of as the 
nonparametric component. A trivial solution occurs when we take + 7 = 1, in 
which case (12. 2 p is minimised when W = ¥n. Hence, a is not uniquely defined. To 
handle the identifiability issue, we redefine the mixing proportion as 

ao := inf |7 G (0, 1) : ^ ~ ^ ^^^^ is a valid Dpj . (2.4) 

Intuitively, this definition makes sure that the "signal" distribution Fg does not in- 
clude any contribution from the known "background" Fi,. In this paper we consider 
the estimation of ao as defined in (12. 4p . 
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Suppose that we start with a fixed Fs,Fb and a satisfying (II. ip . As seen from 
the above discussion we can only hope to estimate aQ, which, from its definition in 
fl2.4p . is smaller than a, i.e., ao < a. A natural question that arises now is: under 
what condition(s) can we guarantee that the problem is identifiable, i.e., ao = a? The 
following result provides an answer and is proved in the Appendix. 

Lemma 2.2. Suppose that Fg and Ft, are absolutely continuous, i.e., they have den- 
sities fs and fb, respectively. Then ao < a if and only if there exists c > such that 
fs{x) > cfb{x), for allx eR. 

The above lemma states that if there does not exist any c > for which fs{x) > 
cfb{x), for all X G M, then ao = a and we can estimate the mixing proportion correctly. 
Note that, in particular, if the support of Fg is strictly contained in that of Fb, then 
the problem is identifiable and we can estimate a. As in [GW04] . we define any 
distribution G to be pure if essinf^gK (^(t) = 0, where g is the density corresponding 
to G and essinftgK^f = inf{a G M : fi{{x : g{x) > a}) = 0}, fi being the Lebesgue 
measure. They proved that purity of Fs is a sufficient condition for identifiability of 
the model when Fb is the uniform distribution. This is indeed an easy consequence 
of the above lemma. A few remarks are in order. 

Remark 2.3. If Fs is N^fis,^^) and Fb (7^ Fs) is N{fib,<7b) then it can be easily 
shown that the problem is identifiable if and only if as < o'b. Now consider a mixture 
of exponentials, i.e., Fs is E{as,as) and Fb (7^ Fs) is E{ab,ab), where E{a,a) is the 
distribution that has the density (1/cr) exp(— (x — a)/(7)l(^a,oo){x). In this case, the 
problem is identifiable if as > ab, as this implies the support of Fg is a proper subset 
of the support of Fb. But when as < ab, the problem is identifiable if and only if 
as < ab. 

Remark 2.4. It is also worth pointing out that even in cases where the problem is 
not identifiable the difference between the true mixing proportion a and the estimand 
ao may be very small. Consider the hypothesis test Ho : 6 = versus Hi : 6 for 
the model N{6, 1) with test statistic X . The density of the p-values under 9 is 

fg^p) = lg-me2/2^g-v^e2$-i(i-p/2) _^ gv^e2$-i(i-p/2)j^ 

where m is the sample size. Here feiX) = e~™^^/^ > 0, so the model is not identifiable. 
As Fb is uniform, it can be easily verified that ao = a — ainip fg{p) . However, since 
the value of fg is exponentially small in m, ao — a is very small. In many practical 
situations, where m is not too small, the difference between a and ao is negligible. 
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It should be noted that the problem may actually be identifiable if we have further 
restrictions on Fg, e.g., if we require Fg to be normal. 



2.3 Estimation of the mixing proportion ao 

Note that when 7 = 1, F^^^ = F„ = F^^^ where F^^^ and F^^^ are defined in (12. ip and 
using ( 12. 2p . respectively. Whereas, when 7 is much smaller than ao the regularisation 
of modifies it, and thus and F^^^ are quite different. We would like to compare 
the naive and isotonised estimators F^n ^In^ respectively, and choose the smallest 
7 for which their distance is still small. This leads to the following estimator of ao: 

«n = inf |7 e (0, 1] : 7 dr.{Fl^, F^^J " ' ^^'^^ 

where c„ is a sequence of constants and dn stands for the L2(F„) distance, i.e., if 
(yf, : M — 7- M are two functions, then 



dnig, h) = \l j iaix) - h{x)y rfF„(x). 

It is easy to see that 

rf„(F„, 7F.':„ + (1 - l)F,) = 7 d^iP:^^, Fl^). (2.6) 

For simplicity of notation, using (12. 6p . we define ^dn{F]^, Fsn) for 7 = as (i„(F„, Fh) = 
lim^^o^ ■jdn{Fs ^sn)- This convention is followed in the rest of the paper. 

The choice of c„ is important, and in the following we address this issue in detail. 
We derive conditions on c„ that lead to consistent estimators of ag. We will also show 
that particular choices of c„ will lead to lower confidence bounds for ao. 



2.4 Consistency of an 

In this section we prove the consistency of a„ through a series of elementary results 
that are proved in the Appendix. 

Lemma 2.5. For 1 > 7 > ao, 

7c?„(F,'^„,F,y <c?„(F,FO. 

Thus, 

7<i,.K„,F,-;„)"(°- (2.7) 

[ > 0, 7 - ao < 0, 

as n 00. 
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Lemma 2.6. The set 



A. 



J e [0,1]: J dn{F:^^,FlJ< 



c. 



■n 



■n 



is convex. Thus, An = [dn, 1]. 

Theorem 2.1. If Cn/ y/n — )• and Cn — )■ oo, then an — )■ ceo. 

The above resuh shows that for a broad range of choices of c„, our estimation 
procedure is consistent. 

2.5 Lower bound for aQ 

Our goal in this sub-section is to construct a finite sample lower confidence bound ai 
with the property 



for a specified confidence level (1 — /3) (0 < /3 < 1), that is valid for any n. Such a 
lower bound would allow one to assert, with a specified level of confidence, that the 
proportion of "signal" is at least ai. 

It can also be used to test the hypothesis that there is no "signal" at level /3 by 
rejecting when > 0. Note that it is easy to show that ao = if and only if Fi, = Fg. 
Thus, the hypothesis of no "signal" can be addressed by testing whether ao = or 
not. In fact, the above approach will lead to an exact lower confidence bound when 
ao = 0, i.e., P{q:l = 0) = 1 — /3. This is evident from the proof of the following 
Theorem 12.21 given in the Appendix. The methods of |GW04j and |MR06j usually 
yield conservative lower bounds. 

Theorem 2.2. Let Hn be the CDF of ^/ndn{¥n,F). Let cti be defined as in (1215]) 

with On defined as the (1 — P)-quantile of Hn. Then (12. 8 p holds. 

Note that Hn is distribution- free (i.e., it does not depend on Fg and F^) and can 
be readily approximated by Monte Carlo simulations using a sample of uniforms. For 
moderately large n (e.g., n > 500) the distribution Hn can be very well approximated 
by that of the Cramer-von Mises statistic, defined as 



P{ao > Ai) > 1 - /3 



(2.8) 




Letting Gn to be the CDF of y/nd{¥n, F), we have the following result. 
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Theorem 2.3. 

sup \Hn{x) — Gn{x)\ — )■ ttS 11 ^ OO. 



Hence in practice, for moderately large n, we can take Cn to be the (1 — /?)- 
quantile of Gn or its asymptotic limit, which are readily available (e.g., see |AD52] ). 
The asymptotic 95% quantile of Gn is 0.6792, and is used in our data analysis. 

2.6 A tuning parameter free estimator of aQ 

Point estimators of ao can be developed by choosing particular values for c^, e.g., in 
applications we may choose c„ to be the median of the asymptotic limit of Hn- In this 
sub-section we propose another method to estimate Oq that is completely automated 
and has better finite sample performance (see Section H]). We start with a lemma that 
describes the shape of our criterion function, and will motivate our procedure. 

Lemma 2.7. 7(i„(-FJ„, FJn) ^■^ ^ non-increasing convex function of ^ in (0,1). 
Writing 



7 L 7 V 7 

we see that for 7 > ao? the second term in the RHS is a DF. Thus, for 7 > ao, 
F]^^ is very close to a DF, and hence F]^^ should also be close to F]^^. Whereas, for 
7 < ao, -F7„ is not close to a DF, and thus the distance 7(i„(_F7„, /^n) appreciably 
large. Thus at ao, we have a "regime" change: •jdniFsn^^sn) should have a slowly 
non- increasing segment to the right of ao and a steeply non-increasing segment to 
the left of ao. Figure ([1]) shows two typical such plots of the function 7 dn{F]^, F]n)-: 
where the left panel corresponds to a mixture of A^(2, 1) with A^(0, 1) (setting I) and 
in the right panel we have a mixture of Beta{l, 10) and uniform f/(0, 1) (setting II). 
In both the settings we have used ao = 0.1 and n = 5000. We will use these two 
settings to illustrate our methodology in the rest of this section and also in Section 

Using the above heuristics, we can see that the "elbow" of the function should 
provide a good estimate of ao; it is the point that has the maximum curvature, i.e., 
the point where the second derivative is maximum. 

In the above plots we have used numerical methods to approximate the second 
derivative of 7(i„(F^\, FJn) (using the method of double differencing). We advocate 
plotting the function ^dn{F]^^,F'J^^ as 7 varies between and 1. In most cases, a 
plot similar to Figure fllbP would immediately convey to the practitioner the most 
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Figure 1: Plot of 7 dn{F]^,F],^ (in solid blue) overlaid with its (scaled) second 
derivative (in dashed red) for n = 5000 (in solid blue) for setting I (left panel) and 
setting II (right panel). 

appropriate choice of dj], the estimator of ao- In some cases though, there can be 
multiple peaks in the second derivative, in which case some discretion on the part of 
the practitioner might be required. It must be noted that the idea of finding the point 
where the second derivative is large to detect an "elbow" or "knee" of a function is 
not uncommon; see e.g., |SC04] . In our simulation studies we have used this method 
to estimate ao- 

2.7 Estimation of Fs 



Figure 2: Plot of the estimates F°;j(x) (in dotted red), Fj^^{x) (in solid blue) and Fg 
(in dashed black) for setting I (left panel) and setting II (right panel). 

Let us assume for the rest of the section that the model f 1 1.11) is identifiable, i.e., 
a = ao- Once we have obtained a consistent estimator (which may or may not be 



9 



OLn as discussed in the previous sections) of ao, a natural nonparametric estimator of 
Fg is -Fg"^, defined as the minimiser of (I2.2p . In the following we show that, indeed, 
is consistent (in the sup- norm) for estimating Fg. 

p 

Theorem 2.4. Suppose that — )■ ao- Then, as n oo, 

sup\F:-{x)-Fg{x)\^0. 

An immediate consequence of Theorem 12.41 is that dn{F^'i,F^'i) — )■ as n — )■ cxd. 
Figure ([2]) shows our estimator F"^ along with the true Fg for the same data sets 
used in Figure ([1]). 

2.8 Estimating the density of Fg 

Suppose now that Fg has a density fg. Obtaining nonparametric estimators of fg 
can be difficult, and especially so in our set-up, as it requires smoothing and usually 
involves the choice of tuning parameter (s) (e.g., smoothing bandwidths). 

In this section we describe a tuning parameter free approach to estimating fg, 
under the additional assumption that fg is a non-increasing density. The assump- 
tion that fg is non-increasing, i.e., Fg is concave on its support, is natural in many 
situations (see Section |3] for an application in the multiple testing problem) and has 
been investigated by several authors, including |Gre56] , |WS93] , |LLF05] and |GW04] . 
Without loss of generality, we assume that fg is non-increasing on [0, oo). 

For a bounded function g : [0, oo) — M, let us represent the least concave majorant 
(LCM) of g by LCM[g]. Thus, LCM[g] is the smallest concave function that sits 
above g. Define Ft„ := LCM[Ff;j]. Note that Ft„ is a valid DF. We can now estimate 
fs by where /j^ is the piece-wise constant function obtained by taking the left 
derivative of Fj',^. In the following we show that both Fl^^ and are consistent 
estimators of their population versions. 

p 

Theorem 2.5. Assume that Fg{Q) = and that Fg is a concave on [0,oo). If ctn 

ao, then, as n ^ oo, 

sup|Ft„(a;)-F,(x)|4 0. (2.9) 
Further, if for any x > 0, fs{x) is continuous at x, then, as n ^ oo. 
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Figure 3: Plot of the estimate /J„(a;) (in solid red) and fs (in solid blue) for setting 
I (left panel) and setting II (right panel). 

Computing F^^ and /j„ are straightforward, an application of the PAVA gives 
both the estimators; see e.g., |RWD88j and |GW84j . Figure shows the LCM F^^ 
whereas Figure ([3]) shows its derivative along with the true density fs for the 
same data sets used in Figure ([H). 

Another alternative procedure for estimating Fs and fs, that will again crucially 
use estimation under shape constraints, as in f l2.ip . is provided below, and involves 
solving an optimisation problem. For a fixed 7, consider minimising fl2.2p where W 
is now restricted to the class of all DFs with F{0) = that are concave on [0, 00). 
The new estimator F^^ can be taken as the piece-wise linear concave function such 
that F7„(X(j)) = 9i where 

= (^i,...,^„) =arg min 116 -Vf 

where V = (V^i, V2, . . . , K), Vi := FJJX^,)), z = 1, 2, . . . , n, = 0i„, n &,on, with 
Qinc as in (12 .Sp and 



-'^(l) -^(2) — -'^(l) -^(3) — ^{2) X{n) — 

The estimator 6 is uniquely defined as it is the L2 projection of V on a closed convex 
cone in and can be easily computed using any standard optimisation toolbox (e.g., 
the cvx package in MATLAB; see http://cvxr.eom/cvx/p. Note that Qcon guarantees 



that the fitted 6 will be the evaluation of a concave function on [0, 00). The plot of 
'ydn{Fsny^sn) ^au again be used, as before, to choose an estimator a„ of oq. This 
would then naturally yield estimators of Fg and fs, namely, F"^ and its left-hand 
derivative f"^, respectively. However, due to space constraints, we do not pursue this 
alternative procedure in this paper. 
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3 Multiple testing problem 



The problem of estimating the proportion, a, of true null hypotheses is of interest in 
situations where a large number of hypotheses tests are performed. Recently, various 
such situations have arisen in applications. One major motivation is in estimating 
the proportion of genes that are not differentially expressed in deoxyribonucleic acid 
(DNA) microarray experiments. However, estimating the proportion of true null 
hypotheses is also of interest, for example, in functional magnetic resonance imaging 
(e.g., |TSS01] ) and source detection in astrophysics (e.g., [MGN+Ol] ). 

Suppose that we wish to test n null hypothesis i^oi, -f^02, • • • , -f^on on the basis of a 
data set X. Let Hi denote the (unobservable) binary variable that is if i^oj is true, 
and 1 otherwise, i = 1, . . . , n. We want a decision rule V that will produce a decision 
of "null" or "non-null" for each of the n cases. Here X can be a 6033 x 102 matrix 
of expression values in the prostate data example (see Section [5] for more details; also 
see Section 2.1 of |EfrlO] ) giving rise to n p-values Xi,X2, . . . and P might be 
the rule that rejects Hoi if Xi < 0.001 and accepts Hoi otherwise. 

Our estimator of the mixing proportion can also be used to form the decision 
rule V. The traditional measure of error in this context is the familywise error 
rate (FWER). This is defined as FWER = Prob (# of false rejections > 1), the 
probability of committing at least one type I error. But to control FWER, i.e., to 
guard against any single false positive occurring, is often too strict and will lead to 
many missed findings. In their seminal work |BH95] , the authors argued that a better 
quantity to control is the false discovery rate (FDR), defined as the expectation of 
the proportion of false rejections; more precisely. 



where V is the number of false rejections and R is the number of total rejections. They 
also described a method to control FDR, at level /3, using the following strategy: reject 



above procedure guarantees FDR < Poq. When oq is significantly smaller than 1 an 
estimate of can be used to yield a procedure with FDR approximately equal to (3 
and thus will result in an increased power. This is essentially the idea of the adapted 
control of FDR (see |BHOO] ). See |Sto02] . |Bla04] and |LLF05] for a discussion on the 
importance of efficient estimation of and some proposed estimators. 

Our method can be directly used to yield a consistent estimator of that does 
not require the specification of any tuning parameter, as discussed in Section 12.61 We 




the null hypotheses corresponding to the (ordered) p- values p(i),p(2), where 
k = max{k : < f3k/m}. In fact, under identifiability, it can be shown that the 
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can also obtain a completely nonparametric estimator of Fg, the distribution of the 
p-values arising from the alternative hypotheses. Suppose that Fj, has a density fb 
and Fg has a density fs- To keep the following discussion more general, we allow fb to 
be any known density, although in most applications in the multiple testing set-up we 
will take fb to be the uniform distribution f/(0, 1). For identifiability in this set-up, 
if Fb is taken to be the uniform distribution on (0, 1), we only need to assume that 
infa;g[o,i] fs{x) = 0; see Lemma [221 This is indeed the standard assumption made in 
the literature; see e.g., |NM11] . 

The local false discovery rate (LFDR) is defined as the function / : (0, 1) — )■ [0, oo), 
such where 

= P(H. = oix. = .) = 

and f{x) = aofs{x) + (1 — o:o)fb{x) is the density of the observed p-values. The esti- 
mation of the LFDR / is important because it gives the probability that a particular 
null hypothesis is true given the observed p- value for the test. The LFDR method 
can help us get easily interpretable thresholding methods for reporting the "interest- 
ing" cases (e.g., l{x) < 0.20); see Section 5 of |EfrlO] . Obtaining good estimates of / 
can be tricky as it involves the estimation of an unknown density, usually requiring 
smoothing techniques; see Section 5 of |EfrlO] for a discussion on estimation and in- 
terpretation of /. From the discussion in Section [2?71 under the additional assumption 
that fs is a non-increasing density, we have a natural tuning parameter free estimator 
/ of the LFDR: 

f(x) = (1 -6n)/fe(x) 

anfln{x) + (1 - an)fb{x) ' 

for X G (0, 1). The assumption that fs is non- increasing, i.e., Fs is concave, is quite 
intuitive and natural - when the alternative hypothesis is true the p- value is generally 
small - and has been investigated by several authors, including |GW04] and |LLF05] . 

4 Simulation 

To investigate the finite sample performance of the estimators developed in this paper 
we carry out a few simulation experiments. We also compare their performance with 
other existing methods. 

4.1 Lower bounds for 

Although there has been some work on the estimation of in the multiple testing 
setting, |MR06] and |GW04] are the only papers we found that discuss methodology 
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Table 1: Coverage probabilities of nominal 95% lower confidence bounds for the three 
methods when n = 1000. 



a 


Setting I 


Setting II 


&l 






&l 









0.95 


0.98 


0.93 


0.95 


0.98 


0.93 


0.01 


0.97 


0.98 


0.99 


0.97 


0.97 


0.99 


0.03 


0.98 


0.98 


0.99 


0.98 


0.98 


0.99 


0.05 


0.98 


0.98 


0.99 


0.98 


0.98 


0.99 


0.10 


0.99 


0.99 


1.00 


0.99 


0.98 


0.99 



Table 2: Coverage probabilities of nominal 95% lower confidence bounds for the three 
methods when n = 5000. 



a 


Setting I 


Setting II 








«i 









0.95 


0.97 


0.93 


0.95 


0.97 


0.93 


0.01 


0.98 


0.98 


0.99 


0.98 


0.98 


0.99 


0.03 


0.98 


0.98 


0.99 


0.98 


0.98 


0.99 


0.05 


0.99 


0.99 


0.99 


0.98 


0.98 


0.99 


0.10 


0.99 


0.99 


1.00 


0.99 


0.98 


0.99 



for constructing a lower confidence bound for oq- These procedures are intellectually 
connected and the methods in |MR06] are extensions of those proposed in |G W04j . 
The lower bound cxl proposed in both the papers approximately satisfies (12. 8p and 
has the form 

¥n{t) - t - 7]n,l3S{t) 
OtL = sup , 

te(o,i) -L - r 

where r]n,i3 is a bounding sequence for the bounding function 6{t) at level /3 (see 
|MR06j ). A constant bounding function, 6(t) = 1, is used in [GW04] with rjn^/s = 
V'log(2//3)/2n, whereas |MR,06] suggest a class of bounding functions but observe 
that the standard deviation-proportional bounding function 5{t) = \/t{l — t) has op- 
timal properties among a large class of possible bounding functions. We have used 
this bounding function and a bounding sequence suggested by the authors. We de- 
note the lower bound proposed in |MR06] as af^^, the bound in |GW04] as and 
the lower bound in Section [2751 by To be able to use the methods of |MR06] and 
[GW04j in setting I we had to transform the data such that is uniform U{0, 1). 
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We take a G {0,0.01,0.03,0.05,0.10} and compare the performance of the three 
lower bounds in the two different simulation settings discussed in Section 12.61 For 
each setting we have used a sample size (n) of 1000 and 5000. We present the 
estimated coverage probabilities, obtained by averaging over 5000 simulations, of the 
lower bounds for the different settings in Tables 1 and 2. We can immediately see 
from the tables that the bounds are usually quite conservative. However, it is worth 
pointing out that when = 0, our method has exact coverage, as discussed in 
Section 12.51 Also, the fact that our procedure is simple, easy to implement, and 
completely automated, makes it very attractive. 



4.2 Estimation of aQ 




Figure 4: Density plots of the estimators of a (a G {.01, .03, .05, .10}): (in solid 
red), (in dash-dotted purple), a^^ (in dashed black), cyq (in dashed green) 

and dg (in solid blue). The vertical line (in doted black) denotes the true mixing 
proportion. 

In this sub-section we consider the estimation of Oq. We compare the performance 
of our estimator, proposed in Section 12. 6[ with four other estimators available in the 
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Table 3: Mean and RMSE of the five estimators discussed in Section 



a 


Mean 


RMSE 




"o 


^MR 








^GW 


^MR 






0.01 


0.01 


0.00 


0.00 


0.03 


0.02 


0.01 


0.01 


0.01 


0.04 


0.01 


0.03 


0.03 


0.01 


0.01 


0.06 


0.04 


0.01 


0.03 


0.02 


0.05 


0.01 


0.05 


0.05 


0.02 


0.03 


0.08 


0.06 


0.01 


0.03 


0.02 


0.05 


0.01 


0.10 


0.09 


0.06 


0.07 


0.12 


0.10 


0.01 


0.04 


0.03 


0.04 


0.01 



literature. |Sto02] proposed an estimate of which we denote by . Due to space 
constraints we do not discuss the estimation procedure in [Sto02j . but we would like to 
mention that he uses bootstrapping to choose the tuning parameter involved. |LLF05] 
proposed an estimator which is tuning parameter free but crucially uses the known 
shape constraint of a convex and non- increasing js\ we denote it by dp . We also use 
the estimator proposed in |MR06j for two bounding functions (5(t) = \/t{\ — t) and 
6{t) = 1). For its implementation we have to choose a sequence {/3n} going to zero 
as n — )• oo. |MR06] did not specify any particular choice of {/?„} but required the 
sequence satisfy some conditions. We chose f3n = f3/y/n, where f3 = 0.05. We denote 
the estimator proposed in |MR06] by a^^^ when 6{t) = \/t{l — t) and by when 
5{t) = 1. We denote our estimator by dg. 

We use the same simulation setting as in |LLF05] . A total of = 5000 fea- 
tures were simulated for each J = 10 samples. Let these random variables be 
denoted by Xij , i = 1, . . . ,n, j = 1, . . . , J, and the corresponding realizations 
Xij. Let Xj = {Xij, X2j, . . . , Xnj), and assume that each Xj ~ N{^nxi, Inxn), and 
that Xi, X2, . . . , Xj are independent. We test Hoi : /ij = versus Hoi : fii 
for each i, and calculate a two-sided p-value pi based on a one-sample t-test with 
Pi = 2P(Tj_i > \xil ^Sil.]\). Herexi = Yfj=i^ij/J and = Yfj=i{^ij- ^iY I {J 
are the sample mean and variance, respectively, and Tj_i is a random variable having 
t-distribution with J — 1 degrees of freedom. 

Four different choices of a are considered, namely 0.01,0.03,0.05,0.10. The /ijS 
were set to zero for the true null hypotheses, whereas for the false null hypotheses they 
were drawn from a symmetric bi-triangular density with parameters a = log2(1.2) = 
0.263 and b = log2(4) = 2; see page 568 of |LLF05] for the details. We drew N = 5000 
sets of J = 10 independent 5000-dimensional vectors from the multivariate Gaussian 
distribution N{^nxi, Inxn) and calculated the corresponding 5000 sets of vectors of 
p- values. 
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The mixing proportion a is estimated, using the five different estimates described 
above, for each set of p-values, and the empirical kernel density of the estimates are 
shown in Figure (jl]), for the different choice of a. In Table 3 we give the mean of the 
5000 estimates of the mixing proportion for the five methods along with their root 
mean squared error (RMSE). It is clearly evident that our procedure has the least 
RMSE and the minimum bias. 

5 Real data analysis 
5.1 Prostate data 



0,035 




(a) Histogram of the p- values. (b) Plot of 7d„(^2„, (in solid blue). 

Figure 5: The horizontal line (in solid black) in the left panel indicates the ?7(0, 1) 
distribution. The vertical line (in dotted black) in the right panel indicates the point 
of maximum curvature (Ao). 

Genetic expression levels for n = 6033 genes were obtained for m = 102 men, 
mi = 50 normal control subjects and m2 = 52 prostate cancer patients. Without 
going into the biological details, the principal goal of the study was to discover a 
small number of "interesting" genes, that is, genes whose expression levels differ 
between the cancer and control patients. Such genes, once identified, might be further 
investigated for a causal link to prostate cancer development. The prostate data is 
a 6033 X 102 matrix X having entries Xij = expression level for gene i on patient j, 
i = 1,2, . . . ,n, and j = 1, 2, . . . , m, with j = 1,2,..., 50, for the normal controls and 
j = 51,52, . . . , 102, for the cancer patients. Let Xj(l) and Xj(2) be the averages of 
Xij for the normal controls and for the cancer patients, respectively, for gene i. The 
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two-sample t-statistic for testing significance of gene i is 

Si 

where Si is an estimate of the standard error of — Xj(2), i.e., = (1/50 + 
1/52)E -Ij^.. - ^.(1)}^ + - :^.(2)}V100. 




Figure 6: The left panel shows the estimates F^^ (in dotted red) and (in solid 
blue). The right panel shows the density 

If we only had data from gene i to consider, we could use tj in the usual way to 
test the null hypothesis Hoi: gene i has no effect, i.e., Xij has the same distribution 
for the normal and cancer patients; rejecting ifo? if ti looked too big in absolute value. 
The usual 5% rejection criterion, based on normal theory assumptions, would reject 
Hoi if \ti\ exceeded 1.98, the two-tailed 5% point for a Student-t random variable with 
100 degrees of freedom. 

We will work with the p-values instead of the "t-values" as then the distribution 
under the alternative will have a non-increasing density which we can estimate using 
the method developed in Section 12. 7[ We have plotted the histogram of the p- values 
in Figure (l5all . Figure (l5b|) shows the plot of 7(i„(FJ„, FJ^), as 7 varies from to 1, 
along with our estimator dg, which turns out to be 0.0877. The lower bound d° for 
this data is found to be 0.0512. The other estimates perform similarly except the one 
proposed by [LLFOSj . which does not detect any "signal". In Figure we plot the 
estimate of the distribution of the p- values under the alternative Fs,n{x), and its LCM 
F}^{x), along with the estimate of the density fs, found using the theory developed 
in Section 12.71 In the left panel of Figure we plot the estimated LFDR / for this 
data set. 
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5.2 An application in astronomy 




0.7 O.S 



Figure 7: The left panel shows the plot of the estimated LFDR I for p-values less 
than 0.05 for the prostate data. The right panel shows the plot of 7(i„(F7n5 ^sn) (i^ 
solid blue) overlaid with its (scaled) second derivative (in dashed red). 




Figure 8: In the left panel we have the density of the radial velocity of the contam- 
inating stars overlaid with the (scaled) kernel density estimator of the Carina data 
set. The right panel shows the nonparametric estimator Fs,n (in dashed red) overlaid 
with the closest Gaussian distribution (in solid blue). 

In this sub-section we analyse the radial velocity (RV) distribution of stars in 
Carina, a dwarf spheroidal (dSph) galaxy. The dSph galaxies are low luminosity 
galaxies that are companions of the Milky Way. The data have been obtained by 
Magellan and MMT telescopes (see |WMO"'"07] ) and consist of radial (line of sight) 
velocity measurements for n = 1215 stars from Carina, contaminated with Milky Way 
stars in the field of view. We would like to understand the distribution of the line of 
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sight velocity of stars in Carina. For the contaminating stars from the Milky Way in 
the field of view, we assume a non-Gaussian velocity distribution Ff, that is known 
from the Besancon Milky Way model ( jRRDP03] ). calculated along the line of sight 
to Carina. 

Our estimate of ao for this data set turns out to be 0.356, while the lower 
bound for is found to be 0.322. The right panel of Figure ([7]) shows the plot 
of 'ydn{F^ny ^sn)y 7 varies from to 1, along with the estimated oq. The right 
panel of Figure (E]) shows the estimate of Fs and the closest (in terms of minimising 
the L2{Fs,n) distance) fitting Gaussian distribution. Astronomers usually assume the 
distribution of the radial velocities for these dSph galaxies to be Gaussian in nature. 
Indeed we see that the estimated Fg is close to a normal distribution (with mean 222.9 
and standard deviation 7.51), although a formal test of this hypothesis is beyond the 
scope of the present paper. The left panel of Figure ([8]) shows the density of the 
original data and the known /b, obtained from the Besancon Milky Way model. 

6 Concluding remarks 

In this paper we have developed a procedure for estimating the mixing proportion 
and the unknown distribution in a two component mixture model using ideas from 
shape restricted statistical inference. It should be noted that the methods developed 
in |GW04] . |MR06] and |LLF05j for estimating and fg under the multiple testing 
setting, can, in fact, be generalized to handle situations where is not the uniform 
distribution by transforming the observed XjS to Yi := F^~^(Xj) so that the "back- 
ground" distribution of Yi becomes uniform on (0, 1). However, apart from the fact 
that the methods developed in this paper require minimal assumptions, use different 
techniques and have better finite sample performance, the main advantage of our pro- 
cedure is that we do not have to choose any tuning parameter for its implementation. 

We have established the consistency properties of the estimators developed in the 
paper. However, nothing is presently known about the rates of convergence of d„ 
and the estimators of Fg. Construction of confidence intervals for can be carried 
out if we can find the limiting distribution of d„. It must be mentioned here that 
investigating such asymptotic properties of these estimators is expected to be a hard 
exercise and will be a topic of future research. As we have observed in the astronomy 
application, goodness-of-fit tests for Fg are important - it can guide the practitioner 
to use appropriate parametric models for further modelling and study. However a 
formal goodness-of-fit test is beyond the scope of the present paper. 
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A Appendix 1 



A.l Proof of Lemma [272] 

Suppose that ao < a. Then there exists a* G {ao,a) such that [F — (1 — a*)Fb]/a* 
is a vahd DF. Using the fact that F = aFg + (1 — a)Fh and letting rj := a/a* > 1, 
we see that := rjFs — {rj — 1)-Ff, must be a vahd DF. For to be non-decreasing, 
we must have T]fs{x) — [f] — l)fb{x) > for all x G M. This implies that we must 
have fs{x) > (1 — l/r])fb{x) for all a; G M, which completes the argument. Retracing 
the steps backwards we can see that if for some c > (which necessarily has to be 
less than 1) fs{x) > cfi,{x), for all x G M, then there exists a* := a{l — c) for which 
[F — {1 — a*)Fh]/a* is a valid DF. Now, from the definition of ao, it follows that 
ao < a* < a. 

A. 2 Proof of Lemma 12.51 

Letting 

F-(1-7)F, 
7 

observe that 

Also note that F^ is a valid DF for 7 > ao- As F^n is defined as the function that 
minimises the L2(F„) distance of F7„ over all DFs, 

7 dniF^^^^F^J < 7 dniP^^^^F^) = rf„(F,F„). 

To prove the second part of the lemma, notice that for 7 > ao the result follows 
from above and the fact that dn{F, F„) as n — )■ 00. 

For 7 < ao, F^ is not a valid DF, by the definition of ao. Note that as n ^ 00, 
Ps,n point-wise. So, for large enough n, F]^^^ is not a valid DF, whereas F]^^ is 

always a DF. Thus, dn{F]^,F]^ converges to something positive. 

A. 3 Proof of Lemma 12.61 

Assume that 71 < 72 and 71,72 G An- If 73 = rj'~^i + (1 — ^7)72, for < < 1, it is 
easy to observe from (12.11) that 

ri{liF]^n) + (1 - V){12F]%) = ^^F]%. 
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Note that [?7(7i-f7n) + ~ ''7)(72-^7n)]/73 a valid DF, and thus from the definition 
of FJn) have 



r7(7im + (l-r/)(72ra 



73 

^(7li^.Tn) + (1 - ^)(72A'!?.) r7(7lF,':n) + (1 - ^)(72i^^„: 



n 



73 73 



< — rfnlF,-^;., FJ^J + ^^^^d^{F]l, F]X) (A.l) 

73 73 

where the last step follows from the triangle inequality. But as 7i, 72 € the above 
inequality yields 

'^n\J^ s.w'^ s,n) — . /— . "T 



73 v^7i 73 aA72 A/^73' 

Thus 73 G 

A. 4 Proof of Theorem 12.11 

We need to show that P(|d„ — OqI > e) for any e > 0. Let us first show that 

P{6in — ao < — e) — )• 0. 

The statement is obviously true if ao < e. So let us assume that > e. Suppose 
dn — ao < i-e., d„ < ao — e- Then by the definition of q;„ and the convexity of 
An, we have (ao — e) G v4„ (as An is a convex set in [0, 1] with 1 E An and d„ G v4„), 
and thus 

dn{F:o^-% F:D < -TT^ (A.2) 



But by (12 .yp the L.H.S. of (1A.2I) goes to a non-zero constant in probability. Hence, if 
^ ^ 0, 

P{&n -ao<-e)<P f rf„(F,X^ ^.T') ^ ^r^^ ^ 0- 

V A/n(ao-e)/ 

This completes the proof of the first part of the claim. 
Now suppose that d„ — ao > e. Then, 

an-ao>e rf„(F,"«+^ > 

dn(¥n,F) > Cn. 
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The first implication follows from definition of an-, while the second implication is 
true by Lemma [2.51 The R.H.S. of the last inequality is (asymptotically similar to) 
the Cramer-von Mises statistic for which the asymptotic distribution is well-known 
and thus if c„ — )■ oo then the result follows. 

A.5 Proof of Theorem [2721 

Note that 

= P(v^rf„(F„,F) > 
= /3, 

where we have used the fact that aodn^F^n^ ^s"^) = dni¥n, F). 
A.6 Proof of Theorem [2731 

It is enough to show that sup^ \Hn{x) — G{x)\ — i- 0, where G is the limiting distribu- 
tion of the Cramer-von Mises statistic, a continuous distribution. As sup^ |G'„(x) — 
G{x)\ — > 0, it is enough to show that 

V^dn{¥n, F) - v^rf(F„, F) 4 0. (A. 3) 



We now prove flA.3p . Observe that 

n{dl - d2)(F„, F) = v/^(E„ - P)[g^] = z/„(^„), (A.4) 

where (jn = \/n(¥n — P)^, Pn denotes the empirical measure of the data, and Un := 
^/n(Fn — P) denotes the usual empirical process. We will show that Unign) — > 0, 



which will prove (lA.4p . 



For each positive integer n, we introduce the following class of functions 

g^(n) = j y/^{H -Ff -.H isa vahd DF and sup \H{t) - F{t)\ < ^ 1 . 
t tm J 



Let us also define 

D„ :=sup v^|F„(t) 
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From the definition of cjn and D"^, we have ^„(t) < -^D"^, for all t G M. As Dn = 
Op(l), for any given e > 0, there exists c > (depending on e) such that 

P{9n i Qcin)} = P{v^sup \gn{t)\ > c^} = P{Dl > c') < e, (A.5) 

t 

for all sufficiently large n. Therefore, for any 6 > 0, 

P{Wnign)\>S} = P{Wnign)\> 5,gnegc{n)} + P{\Ur^igr^)\> S,gn^Gc{n)} 

< P{Wn{gn)\ >S,gne + ^ Qcin)} 

< p\ sup \unig)\ >s\+ P{gn i Qc{n)} 

< -E{ sup Wn{gn)\}+P{gn^Qc{n)} 



S 



geGcin) 





2 

where Gc{n) := ^ is an envelope for Qd^) and J is a constant. Note that to derive 
the last inequality we have used the maximal inequality in Corollary (4.3) of Pollard 
(1989); the class ^c('^) is "manageable" in the sense of jPol8 9] (as a consequence of 
equation (2.5) of [VdOnn] ). 

Therefore, for any given 6 > and e > 0, for large enough n and c > we can 
make both Jc^/{6n) and P{gn ^ Qc{n)} less than e, using flA.51) and flA.6p . and thus, 
P{Wn{gn)\ > 5} < 2e. The result now follows. 

A. 7 Proof of Lemma 12.71 

Let < 7i < 72 < 1. Then, 

72 d^iP:^^, F]l) < 72 dniP:^^, ilih2)F:^n + (1 " Ti/Ts)^.) 

= rf„(7i^.':^ + (72 - li)F,, ^,F:1^ + (72 - li)F,) 

^ 7l dn{Ps^nJ Fg^j^) , 

which shows that •ydn^F^^, F^^) is a non-increasing function. 

To show that 'jdniP^^n^ P^^n) is convex, let < 71 < 72 < 1 and 73 = r]'ji + {l—r])'y2, 
for < ?7 < 1. Then, by ( 1A.1|) we have the desired result. 

A. 8 Proof of Theorem [23] 

Note that from (l21il . 
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for all X G M. Thus we can bound F°'^{x) as follows: 

where Z)„ = sup^gjg |F„(x) — F{x)\, and both the upper and lower bounds are non- 
decreasing functions in x. Thus, from the characterisation of F^^ and properties 
of isotonic estimators (see e.g., Theorem 1.3.4 of |RWD88] ). we know that for all 
i = 1,2,. ..,n, 

—Fs{X,) : ^ < < —Fs{Xi) + : + — . 

Therefore, for all z = 1, 2, . . . , n, 

Ian — tin I -Dn P 

< 2^^ — ^ + ^4o, 

p 

as n — )■ oo, using the fact «„ — )■ G (0, 1). As the XjS are dense in the support of 
F, we have the desired result. 

A.9 Proof of Theorem [231 

Let e„ := sup^.^^ \F"j^{x) — Fs{x)\. Then the function Fs + e„ is concave on [0, oo) 
and majorises -F^"^. Hence, for all x G [0, oo), F^!^[x) < Fl^^{x) < Fs{x) + e„, as 
is the LCM of Thus, 

-en < Ff-{x) - F,{x) < Ft„(x) - F.,{x) < e„, 

and therefore, 

sup \Fl^{x) - Fs{x) \ < tn. 

p 

By Theorem 12.41 as e„ — 0, we must also have (12. 9p . 

The second part of the result follows immediately from the lemma is page 330 of 
|RWD88j ■ and is similar to the result in Theorem 7.2.2 of that book. 
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